Here, we attach data and print the first few rows to understand how it looks and what we have access to from the World Management Survey
data <- read.csv("management_and_health_034.csv")
attach(data)
head(data)
hospital_id <int> | management <dbl> | zmanagement <dbl> | lmba <dbl> | mba <dbl> | zami_rate <dbl> | hos_bed <dbl> | hos_lbed <dbl> | hos_fprofit <int> | ||
---|---|---|---|---|---|---|---|---|---|---|
1 | 1 | 2.318182 | -0.1367758 | 0.4700036 | 0.6 | 0.3727507 | 772.59 | 6.649748 | 0 | |
2 | 2 | 2.454545 | -0.1593851 | 0.4054651 | 0.5 | 1.1042490 | 367.72 | 5.907322 | 0 | |
3 | 3 | 2.363636 | -0.1273013 | 0.6418539 | 0.9 | -0.2483238 | 760.59 | 6.634095 | 0 | |
4 | 4 | 2.363636 | -0.2350837 | 0.3364722 | 0.4 | 1.2353290 | 808.43 | 6.695094 | 0 | |
5 | 5 | 1.818182 | -0.9032297 | 0.1823216 | 0.2 | 0.2235312 | 1105.00 | 7.007601 | 0 | |
6 | 6 | 2.409091 | -0.0616586 | 0.0000000 | 0.0 | 0.0276812 | 724.25 | 6.585136 | 0 |
Printing some in-depth summary statistics:
describe(data,fast=TRUE)
vars <dbl> | n <dbl> | mean <dbl> | sd <dbl> | median <dbl> | min <dbl> | max <dbl> | ||
---|---|---|---|---|---|---|---|---|
hospital_id | 1 | 453 | 1.001364e+03 | 766.69221839 | 1476.0000000 | 1.000000 | 1930.0000000 | |
management | 2 | 453 | 2.765626e+00 | 0.59430307 | 2.8000000 | 1.050000 | 4.3000000 | |
zmanagement | 3 | 453 | 5.094226e-01 | 0.93059795 | 0.5303775 | -2.163212 | 2.9001230 | |
lmba | 4 | 453 | 2.459330e-01 | 0.20176679 | 0.2231435 | 0.000000 | 0.6931472 | |
mba | 5 | 453 | 3.060927e-01 | 0.28010185 | 0.2500000 | 0.000000 | 1.0000000 | |
zami_rate | 6 | 453 | 1.895272e-02 | 1.01495585 | -0.1038135 | -2.192124 | 4.8339860 | |
hos_bed | 7 | 453 | 3.463373e+02 | 332.85185550 | 245.0000000 | 18.000000 | 3000.0000000 | |
hos_lbed | 8 | 453 | 5.243901e+00 | 4.98654323 | 5.4889380 | -99.000000 | 8.0063680 | |
hos_fprofit | 9 | 453 | 5.960265e-02 | 0.23701092 | 0.0000000 | 0.000000 | 1.0000000 | |
hos_nfprofit | 10 | 453 | 3.200883e-01 | 0.46702598 | 0.0000000 | 0.000000 | 1.0000000 |
Creating Appendix B: Figure 1
Below, this plot illustrates how the quality of management changes based on the country the hospital was surveyed in.
ggplot(data=data, aes(x=country, y=zmanagement, group=country, color=country)) +
geom_point() +
geom_jitter() +
theme_minimal() +
labs(x="country",
y="zmanagement",
group="Country",
title="Management Quality by Country")
Creating Appendix C: Table 1
As an extension of the analysis from the plot above, we will be using simple linear regression to see how related the country the hospital is in to the quality of management. Note all coefficients are very statistically significant, and Brazil is the excluded category represented by the intercept.
summary(lm(zmanagement ~ country))
##
## Call:
## lm(formula = zmanagement ~ country)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.46432 -0.50219 -0.04971 0.50020 2.00772
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.18642 0.07434 -2.508 0.012502 *
## countryCanada 0.63603 0.17499 3.635 0.000311 ***
## countrySweden 0.57471 0.14919 3.852 0.000134 ***
## countryUnited Kingdom 0.52922 0.10209 5.184 3.3e-07 ***
## countryUnited States 1.33023 0.09626 13.818 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.7761 on 448 degrees of freedom
## Multiple R-squared: 0.3107, Adjusted R-squared: 0.3045
## F-statistic: 50.47 on 4 and 448 DF, p-value: < 2.2e-16
In contrast to the analysis above, this plot illustrates how the mortality risk changes based on the country the hospital was surveyed in.
ggplot(data=data, aes(x=country, y=zami_rate, group=country, color=country)) +
geom_point() +
geom_jitter() +
theme_minimal() +
labs(x="country",
y="zami_rate",
group="Country",
title="Heart Attack Mortality Rate by Country")
As an extension of the analysis from the plot above, we will be using simple linear regression to see how related the country the hospital is in to the mortality rate of heart attacks. Note none of the coefficients are statistically significant, and Brazil is the excluded category represented by the intercept.
summary(lm(zami_rate ~ country))
##
## Call:
## lm(formula = zami_rate ~ country)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.2735 -0.6695 -0.1333 0.5402 4.8548
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.844e-08 9.756e-02 0.000 1.000
## countryCanada -1.177e-08 2.297e-01 0.000 1.000
## countrySweden 5.355e-02 1.958e-01 0.273 0.785
## countryUnited Kingdom 8.140e-02 1.340e-01 0.607 0.544
## countryUnited States -2.083e-02 1.263e-01 -0.165 0.869
##
## Residual standard error: 1.019 on 448 degrees of freedom
## Multiple R-squared: 0.001773, Adjusted R-squared: -0.00714
## F-statistic: 0.1989 on 4 and 448 DF, p-value: 0.9389
Creating Appendix B: Figure 2
Below, this plot illustrates how management quality differs across countries
ggplot(data=data, aes(x=country, y=zmanagement, group=country, color=as.factor(yy))) +
geom_point() +
geom_jitter() +
theme_minimal() +
labs(x="yy",
y="zmanagement",
color="Year of Survey",
title="Year of Survey on Management Quality, by Country")
It is very important to note that in the figure above, only the United Kingdom’s hospitals span two different sample groups (red being 2006 and green being 2009). Therefore, we should investigate this further
Creating Appendix B: Figure 3
Below, this plot illustrates how heart attack mortality rates vary across countries
ggplot(data=data, aes(x=country, y=zami_rate, group=country, color=as.factor(yy))) +
geom_point() +
geom_jitter() +
theme_minimal() +
labs(x="yy",
y="zami_rate",
color="Year of Survey",
title="Year of Survey on Mortality Rate, by Country")
Creating Appendix B: Figure 4
Note that there does not appear to be a statistically significant relationship or different between the two survey years and the quality of management present. Because of this lack of significance, we can feel more confident by not controlling for time as it doesn’t seem necessary. More support of this decision will be presented later in this file.
ggplot() +
geom_point(data=data %>% filter(country == "United Kingdom"),
aes(x=yy, y=zmanagement, color=as.factor(yy))) +
geom_smooth(method="lm",
data=data %>% filter(country == "United Kingdom"),
fill="grey", color="black",
aes(x=yy, y=zmanagement, color="Line of Best Fit")) +
theme_minimal() +
labs(x="yy",
y="zmanagement",
color="Year of Survey",
group="Line of Best Fit",
title="Analysis of Year on Management Quality for UK Hospitals")
## `geom_smooth()` using formula = 'y ~ x'
Creating Appendix B: Figure 5
For robustness sake, we will also be completing the same analysis as the figure above but instead searching for a potential realtionship between year and mortality rate. Because of this lack of significance, we can feel more confident by not controlling for time as it doesn’t seem necessary. More support of this decision will be presented later in this file.
ggplot() +
geom_point(data=data %>% filter(country == "United Kingdom"),
aes(x=yy, y=zami_rate, color=as.factor(yy))) +
geom_smooth(method="lm",
data=data %>% filter(country == "United Kingdom"),
fill="grey", color="black",
aes(x=yy, y=zami_rate, color="Line of Best Fit")) +
theme_minimal() +
labs(x="yy",
y="zami_rate",
color="Year of Survey",
group="Line of Best Fit",
title="Analysis of Year on Mortality Rate for UK Hospitals")
## `geom_smooth()` using formula = 'y ~ x'
Creating Appendix B: Figure 6
We wanted to check if the commuting time to the nearest joint Medical Business school would be a good instrumental variable. Our hypothesis was that a hospital that is closer to a hospital would have a strong supply of good managerial knowledge and talent, so the farther away a hospital is the lower zmanagement would be on average. The plot below shows that trend, leading some evidence to using logcom_ttime as an instrument.
# zmanagement vs logcom_ttime
ggplot() +
geom_point(data=data, aes(x=logcom_ttime, y=zmanagement), color="blue") +
geom_smooth(method="lm",
data=data,
fill="grey", color="black",
aes(x=logcom_ttime, y=zmanagement, color="Line of Best Fit")) +
theme_minimal() +
scale_fill_discrete("blue") +
labs(x="ln(commuting time to nearest joint M-B)",
y="zmanagement",
group="Line of Best Fit",
title="Analysis of Commuting Time to Nearest M-B on Management Quality")
## `geom_smooth()` using formula = 'y ~ x'
Creating Appendix B: Figure 7
We wanted to check if the age of the nearest joint Medical Business school would be a good instrumental variable. Our hypothesis was that a hospital that is older and more established will have a track record of sending graduates to this hospital, accumulating management skill over time, leading to higher management scores. The plot below shows that trend, leading some evidence to using com_lage as an instrument.
#zmanagement vs com_lage
ggplot() +
# we exlude the outliers which are very negative (-99s)
geom_point(data=data %>% filter(com_lage >= 0), aes(x=com_lage, y=zmanagement), color="blue") +
geom_smooth(method="lm",
data=data %>% filter(com_lage >= 0),
fill="grey", color="black",
aes(x=com_lage, y=zmanagement, color="Line of Best Fit")) +
theme_minimal() +
scale_fill_discrete("blue") +
labs(x="ln(age of nearest M-B)",
y="zmanagement",
#color="Year of Survey",
group="Line of Best Fit",
title="Analysis of Age of Nearest M-B on zmanagement")
## `geom_smooth()` using formula = 'y ~ x'
Another variable that we wanted to investigate was the
survey_reliability
because of its believed importance to
our initial hypothesis before going in-depth on the modeling
process.
# EXPLORING SINGLE VARIABLES
# CONTINUOUS
# histogram
ggplot() +
geom_histogram(aes(x=survey_reliability[survey_reliability>=0]))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
# scatter
ggplot() +
geom_point(aes(x=survey_reliability[survey_reliability>=0], y=zami_rate[survey_reliability>=0]))
# linear model
summary(lm(zami_rate[survey_reliability>=0] ~ survey_reliability[survey_reliability>=0]))
##
## Call:
## lm(formula = zami_rate[survey_reliability >= 0] ~ survey_reliability[survey_reliability >=
## 0])
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.2557 -0.6797 -0.1012 0.5301 4.8538
##
## Coefficients:
## Estimate Std. Error t value
## (Intercept) 0.18871 0.20089 0.939
## survey_reliability[survey_reliability >= 0] -0.02085 0.02417 -0.863
## Pr(>|t|)
## (Intercept) 0.348
## survey_reliability[survey_reliability >= 0] 0.389
##
## Residual standard error: 1.019 on 445 degrees of freedom
## Multiple R-squared: 0.00167, Adjusted R-squared: -0.0005739
## F-statistic: 0.7442 on 1 and 445 DF, p-value: 0.3888
# CATEGORICAL
table(survey_reliability_miss)
## survey_reliability_miss
## 0 1
## 447 6
data %>%
group_by(survey_reliability_miss) %>%
summarise(mean_zami_rate = mean(zami_rate))
survey_reliability_miss <int> | mean_zami_rate <dbl> | |||
---|---|---|---|---|
0 | 0.02047175 | |||
1 | -0.09421443 |
Overall, we know that the variable of interest Y
will be
zami_rate
and that D
will be
zmanagement
for the reasons mentioned in the report.
Additionally, we need to consider the different controls that we have
access to.
Validation for hos_bed
ggplot(data=data, aes(x=hos_bed, y=zmanagement, group=country, color=country)) +
geom_point() +
geom_jitter() +
theme_minimal() +
labs(x="hos_bed",
y="zmanagement",
group="Country",
title="`hos_bed` versus `zmanagement`, by `country`")
ggplot(data=data, aes(x=hos_bed, y=zami_rate, group=country, color=country)) +
geom_point() +
geom_jitter() +
theme_minimal() +
labs(x="hos_bed",
y="zami_rate",
group="Country",
title="`hos_bed` versus `zami_rate`, by `country`")
Validation for hos_fprofit
(note that jitter is
turned on to show the concentration of points around discrete
values)
ggplot(data=data, aes(x=hos_fprofit, y=zmanagement, group=country, color=country)) +
geom_point() +
geom_jitter() +
theme_minimal() +
labs(x="hos_fprofit",
y="zmanagement",
group="Country",
title="`hos_fprofit` versus `zmanagement`, by `country`")
ggplot(data=data, aes(x=hos_fprofit, y=zami_rate, group=country, color=country)) +
geom_point() +
geom_jitter() +
theme_minimal() +
labs(x="hos_fprofit",
y="zami_rate",
group="Country",
title="`hos_fprofit` versus `zami_rate`, by `country`")
Validation for hos_nfprofit
(note that jitter is
turned on to show the concentration of points around discrete
values)
ggplot(data=data, aes(x=hos_nfprofit, y=zmanagement, group=country, color=country)) +
geom_point() +
geom_jitter() +
theme_minimal() +
labs(x="hos_nfprofit",
y="zmanagement",
group="Country",
title="`hos_nfprofit` versus `zmanagement`, by `country`")
ggplot(data=data, aes(x=hos_nfprofit, y=zami_rate, group=country, color=country)) +
geom_point() +
geom_jitter() +
theme_minimal() +
labs(x="hos_nfprofit",
y="zami_rate",
group="Country",
title="`hos_nfprofit` versus `zami_rate`, by `country`")
Validation for grid_temp_new
(note that jitter is
turned on to show the concentration of points around discrete
values)
ggplot(data=data, aes(x=grid_temp_new, y=zmanagement, group=country, color=country)) +
geom_point() +
geom_jitter() +
theme_minimal() +
labs(x="grid_temp_new",
y="zmanagement",
group="Country",
title="`grid_temp_new` versus `zmanagement`, by `country`")
ggplot(data=data, aes(x=grid_temp_new, y=zami_rate, group=country, color=country)) +
geom_point() +
geom_jitter() +
theme_minimal() +
labs(x="grid_temp_new",
y="zami_rate",
group="Country",
title="`grid_temp_new` versus `zami_rate`, by `country`")
ggplot(data=data, aes(x=grid_temp_new_miss, y=zmanagement, group=country, color=country)) +
geom_point() +
geom_jitter() +
theme_minimal() +
labs(x="grid_temp_new_miss",
y="zmanagement",
group="Country",
title="`grid_temp_new_miss` versus `zmanagement`, by `country`")
ggplot(data=data, aes(x=grid_temp_new_miss, y=zami_rate, group=country, color=country)) +
geom_point() +
geom_jitter() +
theme_minimal() +
labs(x="grid_temp_new_miss",
y="zami_rate",
group="Country",
title="`grid_temp_new_miss` versus `zami_rate`, by `country`")
Validation for survey_reliability
(note that jitter
is turned on to show the concentration of points around discrete
values)
ggplot(data=data, aes(x=survey_reliability, y=zmanagement, group=country, color=country)) +
geom_point() +
geom_jitter() +
theme_minimal() +
labs(x="survey_reliability",
y="zmanagement",
group="Country",
title="`survey_reliability` versus `zmanagement`, by `country`")
ggplot(data=data, aes(x=survey_reliability, y=zami_rate, group=country, color=country)) +
geom_point() +
geom_jitter() +
theme_minimal() +
labs(x="survey_reliability",
y="zami_rate",
group="Country",
title="`survey_reliability` versus `zami_rate`, by `country`")
ggplot(data=data, aes(x=survey_reliability_miss, y=zmanagement, group=country, color=country)) +
geom_point() +
geom_jitter() +
theme_minimal() +
labs(x="survey_reliability_miss",
y="zmanagement",
group="Country",
title="`survey_reliability_miss` versus `zmanagement`, by `country`")
ggplot(data=data, aes(x=survey_reliability_miss, y=zami_rate, group=country, color=country)) +
geom_point() +
geom_jitter() +
theme_minimal() +
labs(x="survey_reliability_miss",
y="zami_rate",
group="Country",
title="`survey_reliability_miss` versus `zami_rate`, by `country`")
IV First Stage and creating Appendix C: Table 3
Using the instruments we picked in the above EDA, we can create the first stage of the IV regression. We predict zmanagement using our instruments logcom_ttime, com_lage, and com_lage_miss, along with all of our controls.
stage_1_lm <- lm(zmanagement ~ logcom_ttime + com_lage + com_lage_miss + as.factor(country) + hos_lbed + hos_fprofit + hos_nfprofit + hos_numcompetitors + survey_reliability + survey_reliability_miss + grid_temp_new + grid_temp_new_miss)
summary(stage_1_lm)
##
## Call:
## lm(formula = zmanagement ~ logcom_ttime + com_lage + com_lage_miss +
## as.factor(country) + hos_lbed + hos_fprofit + hos_nfprofit +
## hos_numcompetitors + survey_reliability + survey_reliability_miss +
## grid_temp_new + grid_temp_new_miss)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.67523 -0.42399 -0.02025 0.45768 1.94271
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.220290 0.380709 -3.205 0.00145 **
## logcom_ttime -0.313476 0.099266 -3.158 0.00170 **
## com_lage -0.007723 0.059524 -0.130 0.89683
## com_lage_miss -0.825983 6.122994 -0.135 0.89275
## as.factor(country)Canada 1.044096 0.247661 4.216 3.03e-05 ***
## as.factor(country)Sweden 0.943704 0.236457 3.991 7.71e-05 ***
## as.factor(country)United Kingdom 0.855469 0.190757 4.485 9.35e-06 ***
## as.factor(country)United States 1.413720 0.144762 9.766 < 2e-16 ***
## hos_lbed -0.005605 0.007157 -0.783 0.43398
## hos_fprofit 0.298490 0.158387 1.885 0.06015 .
## hos_nfprofit 0.208298 0.097679 2.132 0.03352 *
## hos_numcompetitors 0.063952 0.065307 0.979 0.32800
## survey_reliability 0.117466 0.018549 6.333 5.99e-10 ***
## survey_reliability_miss 12.594389 1.996967 6.307 6.99e-10 ***
## grid_temp_new -0.001162 0.010948 -0.106 0.91549
## grid_temp_new_miss -0.317342 1.191559 -0.266 0.79011
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.7333 on 437 degrees of freedom
## Multiple R-squared: 0.3997, Adjusted R-squared: 0.3791
## F-statistic: 19.4 on 15 and 437 DF, p-value: < 2.2e-16
Creating Appendix C: Table 4
We can create the second stage by predicting our estimated z_management_hat, and using that instead of zmanagement in the model.
#predict zmanagement_hat for our data using the first stage
zmanagement_hat <- predict(stage_1_lm)
iv_stage_2_lm <- lm(zami_rate ~ zmanagement_hat + as.factor(country) + hos_lbed + hos_fprofit + hos_nfprofit + hos_numcompetitors + survey_reliability + survey_reliability_miss + grid_temp_new + grid_temp_new_miss)
summary(iv_stage_2_lm)
##
## Call:
## lm(formula = zami_rate ~ zmanagement_hat + as.factor(country) +
## hos_lbed + hos_fprofit + hos_nfprofit + hos_numcompetitors +
## survey_reliability + survey_reliability_miss + grid_temp_new +
## grid_temp_new_miss)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.1208 -0.6572 -0.1018 0.5040 4.5676
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.275233 0.768468 -1.659 0.0977 .
## zmanagement_hat -1.043021 0.434819 -2.399 0.0169 *
## as.factor(country)Canada 0.885774 0.537968 1.647 0.1004
## as.factor(country)Sweden 0.873678 0.515995 1.693 0.0911 .
## as.factor(country)United Kingdom 0.946296 0.475245 1.991 0.0471 *
## as.factor(country)United States 1.524804 0.663373 2.299 0.0220 *
## hos_lbed 0.005460 0.009933 0.550 0.5828
## hos_fprofit 0.265066 0.264416 1.002 0.3167
## hos_nfprofit -0.075137 0.164739 -0.456 0.6485
## hos_numcompetitors -0.017080 0.100553 -0.170 0.8652
## survey_reliability 0.104658 0.056938 1.838 0.0667 .
## survey_reliability_miss 10.998751 6.108416 1.801 0.0725 .
## grid_temp_new 0.007524 0.015033 0.500 0.6170
## grid_temp_new_miss 0.330931 1.643613 0.201 0.8405
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.008 on 439 degrees of freedom
## Multiple R-squared: 0.04157, Adjusted R-squared: 0.01319
## F-statistic: 1.465 on 13 and 439 DF, p-value: 0.1271
We also want to check to see if we did the above IV regression correctly by using the ivreg library to check to see if they give the same results. They match coefficients, but have different p-values. We trust the ivreg library more since it is intended to do this instrumental variable regression, so we created Appendix C: Table 4 using this summary.
reg_iv <- ivreg(zami_rate ~ zmanagement + as.factor(country) + hos_lbed + hos_fprofit + hos_nfprofit + hos_numcompetitors + survey_reliability + survey_reliability_miss + grid_temp_new + grid_temp_new_miss
| logcom_ttime + com_lage + com_lage_miss + as.factor(country) + hos_lbed + hos_fprofit + hos_nfprofit + hos_numcompetitors + survey_reliability + survey_reliability_miss + grid_temp_new + grid_temp_new_miss,
data=data)
summary(reg_iv)
##
## Call:
## ivreg(formula = zami_rate ~ zmanagement + as.factor(country) +
## hos_lbed + hos_fprofit + hos_nfprofit + hos_numcompetitors +
## survey_reliability + survey_reliability_miss + grid_temp_new +
## grid_temp_new_miss | logcom_ttime + com_lage + com_lage_miss +
## as.factor(country) + hos_lbed + hos_fprofit + hos_nfprofit +
## hos_numcompetitors + survey_reliability + survey_reliability_miss +
## grid_temp_new + grid_temp_new_miss, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.34884 -0.77235 -0.05527 0.70311 5.04501
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.275233 0.907345 -1.405 0.1606
## zmanagement -1.043021 0.513398 -2.032 0.0428 *
## as.factor(country)Canada 0.885774 0.635189 1.395 0.1639
## as.factor(country)Sweden 0.873678 0.609245 1.434 0.1523
## as.factor(country)United Kingdom 0.946296 0.561131 1.686 0.0924 .
## as.factor(country)United States 1.524804 0.783257 1.947 0.0522 .
## hos_lbed 0.005460 0.011728 0.466 0.6418
## hos_fprofit 0.265066 0.312200 0.849 0.3963
## hos_nfprofit -0.075137 0.194510 -0.386 0.6995
## hos_numcompetitors -0.017080 0.118724 -0.144 0.8857
## survey_reliability 0.104658 0.067228 1.557 0.1202
## survey_reliability_miss 10.998751 7.212318 1.525 0.1280
## grid_temp_new 0.007524 0.017750 0.424 0.6719
## grid_temp_new_miss 0.330931 1.940643 0.171 0.8647
##
## Diagnostic tests:
## df1 df2 statistic p-value
## Weak instruments 3 437 3.333 0.0194 *
## Wu-Hausman 1 438 4.055 0.0447 *
## Sargan 2 NA 4.685 0.0961 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.19 on 439 degrees of freedom
## Multiple R-Squared: -0.3361, Adjusted R-squared: -0.3757
## Wald test: 1.051 on 13 and 439 DF, p-value: 0.4013
As referenced in the model, a lot of the motivation behind using country fixed effects is explained in the EDA section of this file. Here, we will focus on the modeling.
Creating Appendix C: Table 2, this is the final model that was chosen to estimate the effect of management quality on heart attack mortality rates by using fixed-effects.
summary(plm(zami_rate ~ zmanagement + hos_lbed + hos_fprofit + hos_nfprofit + hos_numcompetitors + survey_reliability + survey_reliability_miss + grid_temp_new + grid_temp_new_miss, model="within", index="country", data=data))
## Oneway (individual) effect Within Model
##
## Call:
## plm(formula = zami_rate ~ zmanagement + hos_lbed + hos_fprofit +
## hos_nfprofit + hos_numcompetitors + survey_reliability +
## survey_reliability_miss + grid_temp_new + grid_temp_new_miss,
## data = data, model = "within", index = "country")
##
## Unbalanced Panel: n = 5, T = 24-161, N = 453
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -2.39234 -0.64847 -0.10465 0.53967 4.70908
##
## Coefficients:
## Estimate Std. Error t-value Pr(>|t|)
## zmanagement -0.1823892 0.0648790 -2.8112 0.005156 **
## hos_lbed 0.0086901 0.0097777 0.8888 0.374613
## hos_fprofit -0.0350270 0.2172882 -0.1612 0.872009
## hos_nfprofit -0.2664809 0.1338493 -1.9909 0.047111 *
## hos_numcompetitors -0.1194097 0.0863806 -1.3824 0.167562
## survey_reliability 0.0038517 0.0265082 0.1453 0.884538
## survey_reliability_miss 0.1929170 2.8521172 0.0676 0.946103
## grid_temp_new 0.0094896 0.0149646 0.6341 0.526324
## grid_temp_new_miss 0.6965218 1.6295025 0.4274 0.669265
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 464.8
## Residual Sum of Squares: 444.12
## R-Squared: 0.044487
## Adj. R-Squared: 0.016191
## F-statistic: 2.27098 on 9 and 439 DF, p-value: 0.017093
As mentioned in the model, we will include yy06
and then
yy09
to illustrate the decreased certainty from the model
when adding in these controls.
yy06
summary(plm(zami_rate ~ zmanagement + yy06 + hos_lbed + hos_fprofit + hos_nfprofit + hos_numcompetitors + survey_reliability + survey_reliability_miss + grid_temp_new + grid_temp_new_miss, model="within", index="country", data=data))
## Oneway (individual) effect Within Model
##
## Call:
## plm(formula = zami_rate ~ zmanagement + yy06 + hos_lbed + hos_fprofit +
## hos_nfprofit + hos_numcompetitors + survey_reliability +
## survey_reliability_miss + grid_temp_new + grid_temp_new_miss,
## data = data, model = "within", index = "country")
##
## Unbalanced Panel: n = 5, T = 24-161, N = 453
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -2.44562 -0.64589 -0.10919 0.54880 4.70558
##
## Coefficients:
## Estimate Std. Error t-value Pr(>|t|)
## zmanagement -0.1796177 0.0650326 -2.7620 0.005987 **
## yy06 0.1331070 0.1872783 0.7107 0.477621
## hos_lbed 0.0086568 0.0097833 0.8849 0.376720
## hos_fprofit -0.0363793 0.2174191 -0.1673 0.867193
## hos_nfprofit -0.2672596 0.1339293 -1.9955 0.046604 *
## hos_numcompetitors -0.1194632 0.0864294 -1.3822 0.167613
## survey_reliability 0.0048565 0.0265608 0.1828 0.855004
## survey_reliability_miss 0.2672526 2.8556419 0.0936 0.925480
## grid_temp_new 0.0094309 0.0149733 0.6298 0.529123
## grid_temp_new_miss 0.6908478 1.6304412 0.4237 0.671979
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 464.8
## Residual Sum of Squares: 443.61
## R-Squared: 0.045587
## Adj. R-Squared: 0.015081
## F-statistic: 2.0921 on 10 and 438 DF, p-value: 0.023888
yy06
summary(plm(zami_rate ~ zmanagement + yy09 + hos_lbed + hos_fprofit + hos_nfprofit + hos_numcompetitors + survey_reliability + survey_reliability_miss + grid_temp_new + grid_temp_new_miss, model="within", index="country", data=data))
## Oneway (individual) effect Within Model
##
## Call:
## plm(formula = zami_rate ~ zmanagement + yy09 + hos_lbed + hos_fprofit +
## hos_nfprofit + hos_numcompetitors + survey_reliability +
## survey_reliability_miss + grid_temp_new + grid_temp_new_miss,
## data = data, model = "within", index = "country")
##
## Unbalanced Panel: n = 5, T = 24-161, N = 453
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -2.44562 -0.64589 -0.10919 0.54880 4.70558
##
## Coefficients:
## Estimate Std. Error t-value Pr(>|t|)
## zmanagement -0.1796177 0.0650326 -2.7620 0.005987 **
## yy09 -0.1331070 0.1872783 -0.7107 0.477621
## hos_lbed 0.0086568 0.0097833 0.8849 0.376720
## hos_fprofit -0.0363793 0.2174191 -0.1673 0.867193
## hos_nfprofit -0.2672596 0.1339293 -1.9955 0.046604 *
## hos_numcompetitors -0.1194632 0.0864294 -1.3822 0.167613
## survey_reliability 0.0048565 0.0265608 0.1828 0.855004
## survey_reliability_miss 0.2672526 2.8556419 0.0936 0.925480
## grid_temp_new 0.0094309 0.0149733 0.6298 0.529123
## grid_temp_new_miss 0.6908478 1.6304412 0.4237 0.671979
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 464.8
## Residual Sum of Squares: 443.61
## R-Squared: 0.045587
## Adj. R-Squared: 0.015081
## F-statistic: 2.0921 on 10 and 438 DF, p-value: 0.023888
Alternatively, we will compare the exact same controls as our final
FE model but instead use the region_survey
fixed-effects
and display the reduced certainty it brings into our model.
summary(plm(zami_rate ~ zmanagement + hos_lbed + hos_fprofit + hos_nfprofit + hos_numcompetitors + survey_reliability + survey_reliability_miss + grid_temp_new + grid_temp_new_miss, model="within", index="region_survey", data=data))
## Oneway (individual) effect Within Model
##
## Call:
## plm(formula = zami_rate ~ zmanagement + hos_lbed + hos_fprofit +
## hos_nfprofit + hos_numcompetitors + survey_reliability +
## survey_reliability_miss + grid_temp_new + grid_temp_new_miss,
## data = data, model = "within", index = "region_survey")
##
## Unbalanced Panel: n = 75, T = 1-39, N = 453
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -2.5933593 -0.5940206 -0.0026787 0.5082217 3.1061505
##
## Coefficients:
## Estimate Std. Error t-value Pr(>|t|)
## zmanagement -0.1720557 0.0691502 -2.4881 0.01328 *
## hos_lbed 0.0030032 0.0115874 0.2592 0.79564
## hos_fprofit 0.0699421 0.2527916 0.2767 0.78218
## hos_nfprofit -0.1236863 0.1568130 -0.7888 0.43076
## hos_numcompetitors -0.1487774 0.0980047 -1.5181 0.12985
## survey_reliability 0.0083172 0.0285024 0.2918 0.77060
## survey_reliability_miss 0.8708110 3.0858147 0.2822 0.77795
## grid_temp_new -0.0076748 0.0294098 -0.2610 0.79427
## grid_temp_new_miss -1.1438812 3.1884373 -0.3588 0.71998
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 374.87
## Residual Sum of Squares: 362.93
## R-Squared: 0.031843
## Adj. R-Squared: -0.18593
## F-statistic: 1.34848 on 9 and 369 DF, p-value: 0.2103
Through this script, we hope you are able to gain more insight into our modeling process, the technicality of the work we were able to complete, and the cautionary steps that we took in order to try our best and accurately model the causal relationship that we are interested in.